Abstract

Confessions pages have grown popular on social media sites such as Facebook and Twitter, particularly within college communities. Such pages allow users to anonymously submit confessions related to collegiate experience, which are subsequently broadcast on a public forum. Because of the anonymous nature of disclosure, we believe that confessions pages are novel data sources from which to discover trends and issues in a collegiate community. Aggregating data from more than 20,000 entries posted to one such space, we analyze natural language characteristics of the originating community with LDA, pointwise mutual information, and sentiment analysis. Using a Markov topic model, we identify the latent topics in our corpus and find that loneliness is a highly recurrent theme. Our findings on student confession communities support previous sociological research, contextualizing student loneliness in the age of social networks.


Introduction 

In the age of Social Network Sites (SNS), university students often use anonymous confessions pages on Facebook to voice issues and topics they would not otherwise share under their true identity. Submitted through an externally linked survey, confessional messages are filtered by page administrators and then anonymously posted on a public Facebook community page, where page followers can comment and respond (Simon 2015). One student anonymously writes to a confessions page:

“I honestly feel like a failure. I spent all semester looking for an internship, applied to dozens of places, got interviewed at only one and did not get it. If I cannot get even one, how will I be able to find a real job senior year?”

Another student writes on the campus page for the University of Colorado, Boulder:

“I think I’m a Republican now, and I can’t tell any of 
my friends or anyone else in Boulder, really.” 

Through personal anecdotes, these pages offer insight into “true” campus culture as well as student sociological and political sentiment. Moreover, they generate unprecedented volumes of such textual data: between 2011 and 2013, some community pages individually posted more than 17,000 anonymous messages (Simon 2015). A university campus’s confessions page is therefore a novel corpus from which we can begin to infer trends within a collegiate cohort that would otherwise be difficult to detect through unsolicited and official surveys (Wang, Burke, and Kraut 2013). Patterns in such student discourse could also inform administrators about their respective student bodies and draw general attention to prominent issues or topics in these communities. Confessions pages have already drawn campus administrators’ attention to specific incidents requiring institutional intervention (Simon 2015). In this paper, we employ a textual approach to analyze information disclosed by users of confessions pages. Full anonymity is known to have a positive effect on this process of self-disclosure (Keipi, Oksanen, and Rasanen 2015).

One recent study examined how SNS users access emotional support by broadcasting requests for advice or resources via status updates to their entire network (Ellison et al. 2013). However, users are unlikely to use such a public channel to share embarrassing or graphic experiences and issues; English-speaking users mostly tend to post positive updates publicly (Kramer and Chung 2011). Prior research also suggests that anonymous social media sharing is more likely with controversial content (Zhang and Kizilcec 2014). Therefore, an anonymous campus confessions page may be a more socially acceptable forum for university students to discreetly share negative campus experiences. These posts may be framed as ‘cries for help’ rather than simply ‘calls for support’. In addition, users have access to a broader audience of their campus peers, instead of only their immediate social network.

Given the advantages of anonymous confessions pages as a novel data source for monitoring issues, in this paper we aim to answer a general research question: what does a particular student community talk about on a confessions page?


Proposed Method 

Dataset 

We collected approximately 24,000 anonymous posts published on the Facebook page “Tufts Confessions” from late 2013 to early 2015. Using a single campus page allowed us to coordinate with individual page administrators. By focusing on a particular ‘community sample’, we can link our analysis back to a single, known student community. Moreover, this particular page offered a much larger corpus than other, sparser pages.

Phrase           Freq
I want to        732
I feel like      694
I just want      314
I wish I         306
the fact that    261
all the time     230
in love with     216
every time I     189

Table I: Top trigrams ranked by frequency.

Phrase                          PMI
socio economic background       28.21
spectrum bi curiosity           28.01
prescription ADD medication     27.87
fossil fuel companies           27.69
Israeli Palestinian conflict    26.53
competent natural ability       26.78
beginning anti depressant       24.93
false rape accusation           24.57
segregated culture house        24.08
binge eating disorder           23.98
struggle struggle struggle      23.52
social justice warrior          22.71

Table II: Top trigrams ranked by PMI.

This dataset consisted of publicly visible messages as posted by the page administrators. All messages were filtered by the administrators to confirm their origin from, and relevance to, the Tufts University community.

To preprocess the data, we removed all non-essential stopwords (e.g., “the”, “of”) from each post according to the Natural Language Toolkit (NLTK) stopword list. URL syntax (e.g., “http”, “youtube”) was also removed. After preprocessing, we retained 245,260 of the original 991,297 tokens, with a vocabulary of 25,055 unique words.
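The preprocessing step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the small `STOPWORDS` set is a hypothetical stand-in for the NLTK English stopword list (loaded in practice with `nltk.corpus.stopwords.words("english")`), the paper does not say which "non-essential" stopwords it kept, and the tokenizer regex and `URL_TOKENS` set are likewise illustrative choices.

```python
import re

# Hypothetical stand-in for NLTK's English stopword list, which in practice
# comes from nltk.corpus.stopwords.words("english").
STOPWORDS = {"the", "of", "a", "an", "and", "to", "in", "is", "it", "on"}

# Hypothetical URL fragments to drop, following the examples in the text.
URL_TOKENS = {"http", "https", "www", "com", "youtube"}

# Simple lowercase word tokenizer (an illustrative choice).
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(post):
    return TOKEN_RE.findall(post.lower())


def preprocess(post):
    """Tokenize a post and drop stopwords and URL fragments."""
    return [t for t in tokenize(post) if t not in STOPWORDS and t not in URL_TOKENS]


def corpus_stats(posts):
    """Return (original token count, retained token count, vocabulary size)."""
    original = sum(len(tokenize(p)) for p in posts)
    kept = [t for p in posts for t in preprocess(p)]
    return original, len(kept), len(set(kept))


posts = ["I spent all of the semester looking for an internship",
         "Check out the video at http://www.youtube.com/watch"]
print(preprocess(posts[1]))  # ['check', 'out', 'video', 'at', 'watch']
print(corpus_stats(posts))   # (20, 12, 12)
```

`corpus_stats` mirrors the three figures reported above (original tokens, retained tokens, vocabulary size), here on two toy posts.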


Natural language processing tools 

To explore general characteristics of our corpus, we employed Pointwise Mutual Information (PMI). PMI is an information-theoretic measure of how much more often words co-occur than they would if they were independent; it can be used to discover strongly associated bigrams (word pairs) and trigrams (word triplets) in a text corpus (Tomokiyo and Hurst 2003). Phrases with high PMI are often very specific vernacular or linguistic constructions that recur across a set of documents.
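For a trigram (x, y, z), PMI compares the observed probability of the triplet with the probability expected if its words were independent: PMI(x, y, z) = log2 p(x, y, z) / (p(x) p(y) p(z)). The sketch below ranks trigrams this way and applies the minimum-frequency filter described under Experimental Results; it also includes the raw-frequency ranking used for Table I. It assumes posts are already tokenized, estimates every probability as a count divided by the total number of tokens, and uses a hypothetical threshold `min_freq`, since the paper does not report its cutoff. NLTK's `TrigramCollocationFinder` (with `apply_freq_filter`) and `TrigramAssocMeasures.pmi` offer a comparable library route.

```python
import math
from collections import Counter


def trigram_pmi(docs, min_freq=3):
    """Rank trigrams by PMI, ignoring trigrams seen fewer than min_freq times.

    PMI(x, y, z) = log2(p(x, y, z) / (p(x) * p(y) * p(z))), with every
    probability estimated as a count divided by the total number of tokens.
    """
    unigrams = Counter()
    trigrams = Counter()
    for tokens in docs:
        unigrams.update(tokens)
        # Count trigrams inside each post only, never across post boundaries.
        trigrams.update(zip(tokens, tokens[1:], tokens[2:]))
    n = sum(unigrams.values())
    scored = []
    for tri, count in trigrams.items():
        if count < min_freq:
            continue  # frequency floor: rare triplets would otherwise get inflated PMI
        expected = math.prod(unigrams[w] / n for w in tri)
        scored.append((tri, math.log2((count / n) / expected)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def trigram_frequencies(docs, top=10):
    """Rank trigrams by raw frequency, as in Table I."""
    counts = Counter()
    for tokens in docs:
        counts.update(zip(tokens, tokens[1:], tokens[2:]))
    return counts.most_common(top)


docs = [["a", "b", "c"]] * 3 + [["a", "x", "y"]]
print(trigram_pmi(docs, min_freq=2))  # keeps only ('a', 'b', 'c'), scored log2(12) ≈ 3.585
print(trigram_pmi(docs, min_freq=1))  # ('a', 'x', 'y') now ranks first at log2(36) ≈ 5.17
```

The second call shows why the frequency floor matters: a triplet seen once, built from rare words, outranks a genuinely recurring phrase.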

To understand the topics discussed in “Tufts Confessions”, we used Latent Dirichlet Allocation (LDA), a generative probabilistic model that represents each document as a mixture of discrete topics, each of which is a distribution over words (Blei, Ng, and Jordan 2003). The fundamental assumption of the LDA model is that a document is an orderless “bag of words”; clusters of words that frequently co-occur across all documents give meaning to “topics”.
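The paper does not say which LDA implementation or inference method it used. As a minimal sketch, assuming collapsed Gibbs sampling (one common way to fit LDA), the code below resamples each token's topic from its conditional given all other assignments. The hyperparameters `alpha`, `beta`, `iterations`, and `seed` are illustrative assumptions; the paper's setting would correspond to `num_topics=17`.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    Returns (vocab, doc_topic, topic_word), where doc_topic[d] holds the topic
    proportions of document d and topic_word[k] is topic k's distribution over
    vocab; every row sums to 1.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # tokens of each topic per document
    n_kw = [[0] * V for _ in range(K)]  # count of each word per topic
    n_k = [0] * K                       # total tokens per topic
    z = []                              # current topic of every token

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][index[w]] += 1
            n_k[k] += 1

    topics = range(K)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove this token's assignment before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    doc_topic = [[(c + alpha) / (len(doc) + K * alpha) for c in n_dk[d]]
                 for d, doc in enumerate(docs)]
    topic_word = [[(c + beta) / (n_k[k] + V * beta) for c in n_kw[k]]
                  for k in topics]
    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=5):
    """Return the n most probable words of each topic, like the keyword lists in Table III."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in topic_word]
```

This pure-Python loop is for exposition only; on a corpus of roughly 24,000 posts a dedicated library (for example gensim's `LdaModel`, which uses variational inference rather than Gibbs sampling) is far more practical.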

We are interested not only in the separate topics in our corpus, but also in the relationships between the topics themselves and the distribution of topics within a single post. To model topic transitions within a post, we turned to the hidden Markov model (HMM), which relaxes the LDA “bag of words” assumption by treating the sequence of topics in a post as a Markov chain (Seymore, McCallum, and Rosenfeld 1999). The HMM is suitable for discovering sequential syntactic structure in language (Seymore, McCallum, and Rosenfeld 1999). The average post in our corpus is 35 words long (including stopwords), so a single post plausibly explores a sequence of multiple topics. This assumption allows us to thematically categorize keywords and recognize global topic patterns.
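The transition structure behind Figure 1 can be approximated with a simpler step than full HMM inference. The sketch below assumes each post has already been reduced to an ordered sequence of topic labels (for instance, the most probable LDA topic of each word or sentence, a hypothetical labelling choice) and estimates first-order transition probabilities by counting consecutive pairs. It performs no latent-state inference such as Baum-Welch, so it is a stand-in for, not a reproduction of, the authors' Markov topic model.

```python
def transition_matrix(topic_sequences, num_topics, smoothing=0.0):
    """Estimate first-order Markov transition probabilities between topics.

    Entry [i][j] estimates P(next topic = j | current topic = i), counted over
    consecutive topic pairs within each post (never across posts). A positive
    `smoothing` adds that pseudo-count to every cell.
    """
    counts = [[smoothing] * num_topics for _ in range(num_topics)]
    for seq in topic_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A topic that is never followed by another gets a uniform row.
        matrix.append([c / total for c in row] if total > 0
                      else [1.0 / num_topics] * num_topics)
    return matrix


def most_likely_next(matrix, topic):
    """Most probable next topic, excluding a self-transition."""
    return max((j for j in range(len(matrix)) if j != topic),
               key=lambda j: matrix[topic][j])


# Hypothetical topic sequences for two posts (0 = Loneliness, 1 = Time, 2 = Party).
seqs = [[0, 1, 1, 2], [0, 1]]
m = transition_matrix(seqs, num_topics=3)
print(m[1])                    # [0.0, 0.5, 0.5]
print(most_likely_next(m, 0))  # 1
```

`most_likely_next` corresponds to reading off the most likely out-topic of a row while ignoring the diagonal.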

Experimental Results 

First, we report the top trigrams in “Tufts Confessions” ranked by raw frequency alone in Table I. These phrases illustrate the egocentric syntactic construction of Internet confessions. Confession posts typically express personal desire or sentiment, and they often recount repeated experiences (e.g., “every time I”, “all the time”).

In comparison, to find popular phrases that were also likely distinctive to the Tufts student community, we extracted the top trigrams across all documents according to PMI (Table II). We only reported trigrams that met a minimum frequency of occurrence in our corpus, to ensure that trigrams did not score highly in PMI merely because the word triplets occur rarely overall (e.g., nonsense phrases, proper nouns, non-English phrases). The resulting phrases are both relatively frequent and less generic than the frequency-ranked trigrams.

We can observe that phrases in this latter group resemble political, economic, and racial discourse. These include “fossil fuel companies,” “Israeli Palestinian conflict,” and “segregated culture house.” For our cohort, such phrases are consistent with Tufts University’s vocal emphasis on active citizenship and social entrepreneurship (Fortunato 2013). Our anonymous social media discourse thus appears to reflect university culture. Other phrases suggest recurring mental health behavior (“beginning anti depressant,” “binge eating disorder,” “prescription ADD medication”) as well as sexual health issues, which are perhaps not as publicly known or discussed in such detail. These may include ‘trigger words’: phrases that may cause emotional reactions in victims of traumatic experiences.

Table III presents the results of our LDA discovery. We set the number of topics to be discovered (T = 17) by iteratively running LDA until we found the most ‘interpretable’ topic distribution. Because topic interpretability is subjective, human judges must be experts in the textual domain. Thus, we identified and labelled the topics with the help of the page administrators, who were familiar with the style and content of the particular confessions page and could categorize each grouping of words as a ‘talked-about’ subject. For each such subject, Table III reports the most relevant keywords.

Topic label            Likelihood  Keywords
Loneliness             0.229       want, people, care, sometimes, wish, alone, body, someone, talk, stop
Social Dynamic         0.147       people, white, black, privilege, think, race, social, gay, world
Time                   0.146       year, time, freshman, first, week, every, still, last, yet, day, job
Social Issues          0.099       many, social, isn’t, still, trying, others, issues, racism, without, understand
Party                  0.073       room, dance, floor, door, stop, walk, around, smoke, weed, finals
Hook-up                0.066       attractive, hooked, interested, hooking, anyone, best, sexually, campus
Attractive Individual  0.046       sexy, next, sitting, man, name, hair, gym, naked, dark, dining, ATO, tisch
Domestic               0.045       money, kid, rich, sorority, grade, family, huge, enough, parent, bad, poor
Bathroom               0.0401      penis, bathroom, toilet, poop, paper, clean, seat, put, house, step, take, pee
Body                   0.0337      pants, body, free, fat, drink, eat, obsess, thin, skinny, social, marriage
Sexual assault         0.0301      rape, hate, sexual, culture, trigger, assault, ate, ATO
Academic experience    0.021       liberal, major, conservative, absolutely, cry, turn, easy, homework, gotta, power
Sexual Encounter       0.020       dick, frat, gave, public, cum, laid, guilty, stranger

Table III: LDA topics in order of likelihood.


[Figure 1 image: topic transition matrix; “from topic” axis: Body, Desire, Bathroom, Attractive individual, Social dynamic, Sexual assault, Loneliness, Domestic, Social issues, Time, Party, Academic experience, Hook-up.]

Figure 1: Topic transition probability matrix for all posts in “Tufts Confessions”.

Figure 2: Distribution of document sentiment polarity (values below 0.5 indicate negative sentiment); mean = 0.403.


Discussion and Future Work
Loneliness: a dominant theme

With more than 22% overall likelihood of occurrence by keyword, Loneliness is the most recurrent topic in our corpus, exceeding Sexual assault, Academic experience, and Party combined. The transition probability matrix of our Markov model (Figure 1) further corroborates this: Loneliness dominates the global topic network as the most likely out-topic, excluding self-transitions, across the entire matrix.

Using the power method to approximate the stationary topic distribution of a ‘random topic walk’, we find that Loneliness has the highest value in the dominant eigenvector (0.752), followed by Social issues (0.481). Under the assumption of multi-topic posts, a confession is most likely to steer towards these two topics. This may be interpreted as the most likely ‘direction’ of discourse for an anonymous confessor in this community.
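The ‘random topic walk’ can be sketched with power iteration: repeatedly multiply a probability vector by the transition matrix until it stops changing. This is a minimal illustration assuming a row-stochastic matrix like the one in Figure 1; convergence also assumes the chain is irreducible and aperiodic, which smoothing the transition counts helps ensure. The sketch rescales the vector to sum to one, so its entries are probabilities; the reported values 0.752 and 0.481 already exceed one in total, which is what one would see if the eigenvector were instead scaled to unit Euclidean length.

```python
def stationary_distribution(matrix, tol=1e-12, max_iter=10_000):
    """Approximate the stationary distribution pi = pi P by power iteration.

    `matrix` is row-stochastic: matrix[i][j] = P(next topic j | current topic i).
    """
    n = len(matrix)
    pi = [1.0 / n] * n  # start the random topic walk from a uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        total = sum(nxt)
        nxt = [x / total for x in nxt]  # rescale to sum to 1 (guards against rounding drift)
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Two hypothetical topics: topic 0 is "sticky", topic 1 often returns to topic 0.
P = [[0.9, 0.1],
     [0.5, 0.5]]
print(stationary_distribution(P))  # ≈ [0.8333, 0.1667], i.e. [5/6, 1/6]
```

The topic with the largest stationary weight is the one a long walk over the transition matrix visits most often, which is the sense in which a post "steers towards" it.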


User behavior 

Is an online confessions community innately biased towards expressions of social distress such as Loneliness? Using sentiment analysis (Pang and Lee 200^), we scored each post in our corpus with a polarity measuring the weighted average sentiment of its keywords. Figure 2 shows the resulting distribution, with an average sentiment of 0.403, indicating that the average confessor is likely to express negative sentiment. In particular, the presence of topics such as Sexual assault and Body suggests that this online space may be popular among individuals who have undergone personal trauma as a place to relate their negative experiences in the safety of anonymity. However, research on social network well-being by Burke et al. (Burke, Marlow, and Lento 2010) found that direct communication with online friends correlates with decreased feelings of loneliness. Despite the benefits of anonymous self-disclosure to a campus network, the removal of direct engagement may instead worsen feelings of distress. It is difficult to establish a causal relationship between the social medium and confessional behavior without further study of the role of anonymity.
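The scoring rule described above (a weighted average of keyword sentiment, with values below 0.5 counting as negative) can be sketched as follows. The lexicon, its values, and the choice to weight each keyword by its count in the post are hypothetical; the paper does not name its sentiment lexicon or weights. Posts containing no lexicon words receive the neutral score 0.5.

```python
from collections import Counter

# Hypothetical lexicon: polarity in [0, 1], where values below 0.5 are negative.
LEXICON = {"alone": 0.2, "failure": 0.1, "wish": 0.4, "love": 0.9, "happy": 0.95}


def post_polarity(tokens, lexicon=LEXICON, neutral=0.5):
    """Weighted average polarity of a post's lexicon keywords.

    Each keyword is weighted by how many times it occurs in the post; a post
    with no lexicon keywords gets the neutral score.
    """
    counts = Counter(t for t in tokens if t in lexicon)
    total = sum(counts.values())
    if total == 0:
        return neutral
    return sum(lexicon[w] * c for w, c in counts.items()) / total


def mean_polarity(posts, lexicon=LEXICON):
    """Average post polarity over a corpus (the paper reports 0.403 for its corpus)."""
    scores = [post_polarity(p, lexicon) for p in posts]
    return sum(scores) / len(scores)


print(post_polarity(["i", "feel", "alone", "alone", "wish"]))  # ≈ 0.267, i.e. negative
```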

It is worth noting the overlap between Loneliness keywords and the top-scoring trigrams in Table I (e.g., “want,” “wish”). Most of our students’ confession posts are egocentric requests centered on the personal, particularly on the complications of Loneliness. Even the specific political or social subject matter suggested by Table II caters towards personal desire or experience. This may suggest that SNS users transmit uncontroversial, positive egocentric expressions through status updates, while relaying contentious, negative egocentric expressions as nameless ‘confessions’.

Social utility

Sociological research finds that behavioral transitions during early college experiences are coupled with feelings of loneliness and isolation (Shaver, Furman, and Buhrmester 1985). The thematic prevalence of Loneliness in our corpus extends this notion to the Facebook age: university students now use a common social media platform to report these feelings. Within undergraduate communities, first-year students are the most common cohort to express such isolation and loneliness, based on studies of collegiate Facebook behavior (Kalpidou, Costin, and Morris 2011).


Given the social and psychological benefits of confessional behavior, platforms that encourage honest personal disclosure can contribute to the overall well-being of users such as students transitioning into university life (Weiner et al. 1991). By comparing and cross-validating results across different pages, we hope to extend these findings along geographic, temporal, and institutional dimensions to university students across campuses.


Conclusion 

In this paper, we examined “Tufts Confessions,” one example of the popular genre of university campus confessions pages. We describe some characteristics of 24,000 anonymously authored posts, revealing both expected syntactic trends and unique phrases that provide “cultural” context. To examine topic patterns in this novel corpus, we used Markov topic models to treat our text corpus as a network of topic-word distributions. The discovery of Loneliness as a dominant topic of confession supports and extends the sociology of loneliness in university student life. Future work includes compiling a larger dataset of student confessions across different campus pages for further comparison.